Update TensorRT-LLM v0.14.0 #2401

Merged
merged 1 commit into from
Nov 1, 2024

@kaiyux (Member) commented Nov 1, 2024

TensorRT-LLM Release 0.14.0

Key Features and Enhancements

  • Enhanced the LLM class in the LLM API.
    • Added support for calibration with an offline dataset.
    • Added support for Mamba2.
    • Added support for finish_reason and stop_reason.
  • Added FP8 support for CodeLlama.
  • Added __repr__ methods for class Module, thanks to the contribution from @1ytic in #2191 (Add module __repr__ methods).
  • Added BFloat16 support for fused gated MLP.
  • Updated ReDrafter beam search logic to match Apple ReDrafter v1.1.
  • Improved customAllReduce performance.
  • The draft model can now copy logits directly over MPI to the target model's process in orchestrator mode. This fast logits copy reduces the delay between draft token generation and the start of target model inference.
  • NVIDIA Volta GPU support is deprecated and will be removed in a future release.
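
The release note above does not define the finish_reason and stop_reason fields. As a rough illustration only, the toy loop below assumes the common OpenAI-style convention: finish_reason says why generation ended ("stop" or "length"), and stop_reason names the stop word that ended it, if any. ToyCompletion and toy_generate are hypothetical names and are not part of the TensorRT-LLM LLM API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ToyCompletion:
    """Hypothetical result type for illustration; not a TensorRT-LLM class."""
    tokens: List[str]
    finish_reason: str          # "stop" or "length" (assumed convention)
    stop_reason: Optional[str]  # the stop word that ended generation, if any


def toy_generate(stream: Sequence[str], max_tokens: int,
                 stop_words: Sequence[str] = ()) -> ToyCompletion:
    """Consume a pre-made token stream until a stop word or the token budget is hit.

    Assumes max_tokens >= 1.
    """
    tokens: List[str] = []
    for tok in stream:
        if tok in stop_words:
            # A stop word ended generation; report which one.
            return ToyCompletion(tokens, "stop", tok)
        tokens.append(tok)
        if len(tokens) >= max_tokens:
            # The token budget was exhausted before any stop condition.
            return ToyCompletion(tokens, "length", None)
    # The stream ended on its own (stands in for an end-of-sequence token).
    return ToyCompletion(tokens, "stop", None)
```

Under this assumed convention, a caller can tell a truncated answer (finish_reason == "length") from one that ended on a stop word, and can read stop_reason to see which stop word fired. Check the LLM API documentation for the actual field values.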

API Changes

  • [BREAKING CHANGE] The default max_batch_size of the trtllm-build command is set to 2048.
  • [BREAKING CHANGE] Removed builder_opt from the BuildConfig class and the trtllm-build command.
  • Added logits post-processor support to the ModelRunnerCpp class.
  • Added isParticipant method to the C++ Executor API to check if the current process is a participant in the executor instance.
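
Because the default max_batch_size of trtllm-build is now 2048, build scripts that relied on the previous default may want to pass the value explicitly. The helper below is a minimal sketch that only assembles the argument list. The function name build_command and the paths in the usage note are hypothetical, and the flags should be confirmed against `trtllm-build --help` for the installed version.

```python
from typing import List


def build_command(checkpoint_dir: str, output_dir: str,
                  max_batch_size: int) -> List[str]:
    """Assemble a trtllm-build argument list with an explicit batch size.

    Hypothetical helper: it does not run the build, it only builds argv so
    the batch size no longer depends on the tool's default.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be a positive integer")
    return [
        "trtllm-build",
        "--checkpoint_dir", checkpoint_dir,
        "--output_dir", output_dir,
        "--max_batch_size", str(max_batch_size),
    ]
```

For example, `build_command("./ckpt", "./engine", 256)` yields a list that can be handed to `subprocess.run(..., check=True)`. Pinning the value this way keeps engine memory use from changing silently when the default changes between releases.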

Model Updates

  • Added support for NemotronNas, see examples/nemotron_nas/README.md.
  • Added support for Deepseek-v1, see examples/deepseek_v1/README.md.
  • Added support for Phi-3.5 models, see examples/phi/README.md.

Fixed Issues

Infrastructure Changes

  • The dependent ModelOpt version is updated to v0.17.

Documentation

Known Issues

  • Replit Code is not supported with transformers 4.45 or later.
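
A script that runs the Replit Code example could guard against this known issue with a version check. The following is a sketch under the assumption that "4.45+" means any transformers version at or above 4.45; replit_code_supported is a hypothetical helper, not something shipped with TensorRT-LLM.

```python
import re


def replit_code_supported(transformers_version: str) -> bool:
    """Return True if the given transformers version is below 4.45.

    Only the leading "major.minor" part is compared, so suffixes such as
    ".dev0" or a patch number do not affect the result.
    """
    match = re.match(r"(\d+)\.(\d+)", transformers_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {transformers_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Tuple comparison handles both the major and minor components.
    return (major, minor) < (4, 45)
```

In practice the argument would be `transformers.__version__`, and a False result could be turned into an early, readable error instead of a failure later in the run.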

@kaiyux changed the title from "Update TensorRT-LLM v0.14" to "Update TensorRT-LLM v0.14.0" on Nov 1, 2024
@kaiyux merged commit b088016 into rel on Nov 1, 2024
@kaiyux deleted the preview/rel branch on November 1, 2024 11:48